Final Project DSA 301: Buffalo Crime Incidents

Nguyet Que Tran

Abstract

The crime situation, the safety of the area and the living environment are important issues. Through analyzing and visualizing Buffalo's crime data, the project provides a better view and insight into the current social situation in the area such as common crime types or dates and time frames, or the neigborhood often occurs criminal activities. Not only providing knowledge, this project also helps viewers to wake up and raise their vigilance.

Questions

  1. Which types of crime are mostly happend?
  2. What days crime mostly happened?
  3. What time of day crime mostly happened?
  4. Does crime occur more at weekend/night? What types of crime are happened more at weekend/night?
  5. What area/neighborhoods are high/low rate of crime?
  6. Are there any addresses that have more than 2 crimes occured?

By working with spatial attributions, this project focus on building customized analytical modules for processing and analysis of geospatial data. The goals of this project is to provide information about crime's locations in Buffalo by geospatial mapping such as 2D map, interactive point frequency maps, and interactive point distribution maps.

  1. Mapping crime locations by different conditions such as Crime Types, Neighborhoods.
  2. How are locations and frequencies of theft cases different to location and frequencies of homicide crime cases in Buffalo?
  3. Mapping that showing the number of confirmed crime cases by Buffalo Council Districts? Which Council Districts is most dangeous?

About the Dataset

Source: Buffalo Open Data - Crime Incidents

This dataset is information about crime incidents of Buffalo.

The dataset was created in September 6, 2017 and was updated in February 16, 2022.

There are total 279330 records and 29 attribute fields.

Contents

I. Exploratory Data Analysis

II. Data Cleaning

III. Visualization

IV. Spatial Data

I. Exploratory Data Analysis

In [ ]:
!pip install chart_studio
Collecting chart_studio
  Downloading chart_studio-1.1.0-py3-none-any.whl (64 kB)
     |████████████████████████████████| 64 kB 1.1 MB/s 
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from chart_studio) (2.23.0)
Requirement already satisfied: plotly in /usr/local/lib/python3.7/dist-packages (from chart_studio) (5.5.0)
Collecting retrying>=1.3.3
  Downloading retrying-1.3.3.tar.gz (10 kB)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from chart_studio) (1.15.0)
Requirement already satisfied: tenacity>=6.2.0 in /usr/local/lib/python3.7/dist-packages (from plotly->chart_studio) (8.0.1)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->chart_studio) (2021.10.8)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->chart_studio) (2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->chart_studio) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->chart_studio) (1.24.3)
Building wheels for collected packages: retrying
  Building wheel for retrying (setup.py) ... done
  Created wheel for retrying: filename=retrying-1.3.3-py3-none-any.whl size=11447 sha256=fe128e340ef5e44f5a4bd4e9ae2426facceacd0ba92aae3a0ded0c2a66d37d22
  Stored in directory: /root/.cache/pip/wheels/f9/8d/8d/f6af3f7f9eea3553bc2fe6d53e4b287dad18b06a861ac56ddf
Successfully built retrying
Installing collected packages: retrying, chart-studio
Successfully installed chart-studio-1.1.0 retrying-1.3.3
In [ ]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from chart_studio import plotly
from chart_studio.plotly import plot, iplot
import plotly.graph_objs as go
import math

As mentioned, there are 29 columns. I just choose to read specific 12 columns that are needed for this project.

In [ ]:
# Read the dataset from url, add ?$limit=300000 to read all records
crime_url = 'https://data.buffalony.gov/resource/d6g9-xbgu.csv?$limit=300000'
crime = pd.read_csv(crime_url, usecols=['case_number','incident_datetime','parent_incident_type', 
                                                    'hour_of_day','day_of_week','address_1','city','state','location',
                                                    'latitude','longitude','neighborhood_1'])
crime.head()
Out[ ]:
case_number incident_datetime parent_incident_type hour_of_day day_of_week address_1 city state location latitude longitude neighborhood_1
0 22-1060340 2022-04-16T11:46:00.000 Theft of Vehicle 11 Saturday 0 Block JANICE ST Buffalo NY POINT (-78.815 42.723) 42.723 -78.815 UNKNOWN
1 22-1080081 2022-04-18T02:57:55.000 Assault 3 Monday 0 Block CONDON AV Buffalo NY POINT (-78.905 42.956) 42.956 -78.905 Riverside
2 22-1080284 2022-04-18T10:31:11.000 Theft of Vehicle 10 Monday 0 Block MERRIMAC ST Buffalo NY POINT (-78.828 42.953) 42.953 -78.828 University Heights
3 22-1060834 2022-04-16T21:36:42.000 Assault 21 Saturday 100 Block BECK ST Buffalo NY POINT (-78.838 42.897) 42.897 -78.838 Broadway Fillmore
4 22-1060730 2022-04-16T10:23:48.000 Theft 19 Saturday 900 Block HUMBOLDT PW Buffalo NY POINT (-78.844 42.91) 42.910 -78.844 Masten Park
In [ ]:
crime.shape
Out[ ]:
(281217, 12)
  • The data which use for this project contain 279330 records and 12 attributions.
In [ ]:
crime.dtypes
Out[ ]:
case_number              object
incident_datetime        object
parent_incident_type     object
hour_of_day               int64
day_of_week              object
address_1                object
city                     object
state                    object
location                 object
latitude                float64
longitude               float64
neighborhood_1           object
dtype: object

Limitation of the dataset: Lacking numerical data.

The only numerical data which are useful and can combine with other data is Hour of Day.

Idealy crime data: contain information about number of injured people, dead people, etc.

Check missing values

In [ ]:
# number of missing values in each columns
crime.isnull().sum()
Out[ ]:
case_number                0
incident_datetime          5
parent_incident_type       0
hour_of_day                0
day_of_week                0
address_1                 39
city                       0
state                      0
location                3930
latitude                3930
longitude               3930
neighborhood_1           995
dtype: int64

In total 279,677 cases:

  • There are 5 cases that are missed information about Incident Datetime.

  • 39 cases are missed address information.

  • 874 cases are misses neighborhood information.

In [ ]:
# Cases that do not have DateTime information
crime[crime['incident_datetime'].isnull()]
Out[ ]:
case_number incident_datetime parent_incident_type hour_of_day day_of_week address_1 city state location latitude longitude neighborhood_1
620 10-3130893 NaN Assault 0 Null 200 Block STEVENSON ST Buffalo NY POINT (-78.815 42.858) 42.858 -78.815 Seneca-Cazenovia
2887 12-3390923 NaN Assault 0 Null GRANT ST & AMHERST ST Buffalo NY NaN NaN NaN UNKNOWN
5093 14-0420506 NaN Theft 0 Null NaN Buffalo NY NaN NaN NaN UNKNOWN
7245 11-0400654 NaN Breaking & Entering 0 Null BROADWAY & BAILEY AV Buffalo NY NaN NaN NaN UNKNOWN
10018 13-0720178 NaN Theft 0 Null 1 Block PLYMOUTH AV Buffalo NY NaN NaN NaN UNKNOWN
In [ ]:
crime['parent_incident_type'].value_counts()
Out[ ]:
Theft                   122838
Assault                  57757
Breaking & Entering      53247
Theft of Vehicle         23022
Robbery                  18211
Sexual Assault            2497
Other Sexual Offense      2241
Homicide                   961
Sexual Offense             443
Name: parent_incident_type, dtype: int64
In [ ]:
crime['day_of_week'].value_counts()
Out[ ]:
Friday       42172
Saturday     41628
Monday       39914
Wednesday    39639
Tuesday      39348
Thursday     39293
Sunday       39218
Null             5
Name: day_of_week, dtype: int64
In [ ]:
crime['day_of_week'].unique()
Out[ ]:
array(['Saturday', 'Monday', 'Thursday', 'Sunday', 'Wednesday', 'Tuesday',
       'Friday', 'Null'], dtype=object)
  • There is namning error in column Day of Week. Names of day are duplicated by lower and upper cases.
In [ ]:
# Note: UNKNOWN 2877  
crime['neighborhood_1'].value_counts()
Out[ ]:
Broadway Fillmore     16128
Central               14979
Kensington-Bailey     14614
North Park            13615
Genesee-Moselle       12761
Schiller Park         11918
Elmwood Bidwell       11738
Elmwood Bryant        11293
Upper West Side       10699
University Heights    10596
West Side             10059
Kenfield              10017
Riverside              9247
Lovejoy                8804
Masten Park            8759
Lower West Side        7736
Hopkins-Tifft          7195
Delavan Grider         7172
Fillmore-Leroy         6662
Allentown              6529
Seneca-Cazenovia       6329
South Park             5933
MLK Park               5867
Parkside               5350
Fruit Belt             5268
West Hertel            5268
Black Rock             4566
Hamlin Park            4490
Pratt-Willert          4245
Grant-Amherst          4106
Ellicott               3726
Kaisertown             3482
Central Park           3319
Seneca Babcock         2996
UNKNOWN                2889
First Ward             1867
Name: neighborhood_1, dtype: int64
In [ ]:
crime['hour_of_day'].describe()
Out[ ]:
count    281217.000000
mean         11.803173
std           7.362773
min           0.000000
25%           6.000000
50%          12.000000
75%          18.000000
max          23.000000
Name: hour_of_day, dtype: float64
In [ ]:
crime['hour_of_day'].unique()
Out[ ]:
array([11,  3, 10, 21, 19, 15, 13, 20, 16, 23,  9,  1, 18, 22,  2, 14,  8,
        4, 12, 17,  7,  5,  6,  0])
  • The exact time in Incident Datetime column are converted into only 24 hour in Hour of Day column

Group crime types by neighborhood

In [ ]:
pd.set_option('display.max_rows',500)
crime.groupby(['neighborhood_1','parent_incident_type']).size()
Out[ ]:
neighborhood_1      parent_incident_type
Allentown           Assault                  904
                    Breaking & Entering      793
                    Homicide                  10
                    Other Sexual Offense      29
                    Robbery                  397
                    Sexual Assault            50
                    Sexual Offense             3
                    Theft                   3878
                    Theft of Vehicle         465
Black Rock          Assault                  932
                    Breaking & Entering      963
                    Homicide                  11
                    Other Sexual Offense      34
                    Robbery                  274
                    Sexual Assault            36
                    Sexual Offense             7
                    Theft                   1903
                    Theft of Vehicle         406
Broadway Fillmore   Assault                 3951
                    Breaking & Entering     3505
                    Homicide                 109
                    Other Sexual Offense     112
                    Robbery                 1385
                    Sexual Assault           160
                    Sexual Offense            16
                    Theft                   5407
                    Theft of Vehicle        1483
Central             Assault                 3188
                    Breaking & Entering     1206
                    Homicide                  22
                    Other Sexual Offense     127
                    Robbery                  761
                    Sexual Assault           183
                    Sexual Offense            22
                    Theft                   8676
                    Theft of Vehicle         794
Central Park        Assault                  461
                    Breaking & Entering      701
                    Homicide                   3
                    Other Sexual Offense      28
                    Robbery                  224
                    Sexual Assault            15
                    Sexual Offense             8
                    Theft                   1621
                    Theft of Vehicle         258
Delavan Grider      Assault                 2130
                    Breaking & Entering     1452
                    Homicide                  52
                    Other Sexual Offense      90
                    Robbery                  531
                    Sexual Assault            94
                    Sexual Offense            27
                    Theft                   2122
                    Theft of Vehicle         674
Ellicott            Assault                  815
                    Breaking & Entering      827
                    Homicide                   5
                    Other Sexual Offense      25
                    Robbery                  230
                    Sexual Assault            36
                    Sexual Offense             8
                    Theft                   1462
                    Theft of Vehicle         318
Elmwood Bidwell     Assault                 1229
                    Breaking & Entering     2138
                    Homicide                  16
                    Other Sexual Offense      82
                    Robbery                  641
                    Sexual Assault            68
                    Sexual Offense            11
                    Theft                   6536
                    Theft of Vehicle        1017
Elmwood Bryant      Assault                 1494
                    Breaking & Entering     1737
                    Homicide                  11
                    Other Sexual Offense      41
                    Robbery                  671
                    Sexual Assault            85
                    Sexual Offense            13
                    Theft                   6325
                    Theft of Vehicle         916
Fillmore-Leroy      Assault                 1782
                    Breaking & Entering     1208
                    Homicide                  41
                    Other Sexual Offense      65
                    Robbery                  515
                    Sexual Assault            85
                    Sexual Offense             8
                    Theft                   2322
                    Theft of Vehicle         636
First Ward          Assault                  450
                    Breaking & Entering      451
                    Homicide                   5
                    Other Sexual Offense      22
                    Robbery                   89
                    Sexual Assault            13
                    Sexual Offense             1
                    Theft                    680
                    Theft of Vehicle         156
Fruit Belt          Assault                 1162
                    Breaking & Entering      738
                    Homicide                  22
                    Other Sexual Offense      46
                    Robbery                  359
                    Sexual Assault            65
                    Sexual Offense            16
                    Theft                   2442
                    Theft of Vehicle         418
Genesee-Moselle     Assault                 3430
                    Breaking & Entering     2966
                    Homicide                 100
                    Other Sexual Offense      99
                    Robbery                 1016
                    Sexual Assault           144
                    Sexual Offense            28
                    Theft                   3833
                    Theft of Vehicle        1145
Grant-Amherst       Assault                  773
                    Breaking & Entering      931
                    Homicide                   7
                    Other Sexual Offense      30
                    Robbery                  255
                    Sexual Assault            25
                    Sexual Offense             5
                    Theft                   1728
                    Theft of Vehicle         352
Hamlin Park         Assault                 1049
                    Breaking & Entering     1043
                    Homicide                  19
                    Other Sexual Offense      42
                    Robbery                  291
                    Sexual Assault            39
                    Sexual Offense             9
                    Theft                   1526
                    Theft of Vehicle         472
Hopkins-Tifft       Assault                 1594
                    Breaking & Entering     1065
                    Homicide                   9
                    Other Sexual Offense      71
                    Robbery                  303
                    Sexual Assault            80
                    Sexual Offense            16
                    Theft                   3523
                    Theft of Vehicle         534
Kaisertown          Assault                  849
                    Breaking & Entering      722
                    Homicide                   8
                    Other Sexual Offense      40
                    Robbery                  123
                    Sexual Assault            29
                    Sexual Offense            10
                    Theft                   1400
                    Theft of Vehicle         301
Kenfield            Assault                 2439
                    Breaking & Entering     2166
                    Homicide                  44
                    Other Sexual Offense     103
                    Robbery                  713
                    Sexual Assault            98
                    Sexual Offense            13
                    Theft                   3453
                    Theft of Vehicle         988
Kensington-Bailey   Assault                 2868
                    Breaking & Entering     3054
                    Homicide                  59
                    Other Sexual Offense     108
                    Robbery                 1060
                    Sexual Assault            85
                    Sexual Offense            16
                    Theft                   6094
                    Theft of Vehicle        1270
Lovejoy             Assault                 2111
                    Breaking & Entering     1835
                    Homicide                  24
                    Other Sexual Offense      63
                    Robbery                  520
                    Sexual Assault            92
                    Sexual Offense            17
                    Theft                   3425
                    Theft of Vehicle         717
Lower West Side     Assault                 1633
                    Breaking & Entering     1230
                    Homicide                  27
                    Other Sexual Offense      61
                    Robbery                  464
                    Sexual Assault            72
                    Sexual Offense             9
                    Theft                   3734
                    Theft of Vehicle         506
MLK Park            Assault                 1693
                    Breaking & Entering     1015
                    Homicide                  39
                    Other Sexual Offense      56
                    Robbery                  450
                    Sexual Assault            73
                    Sexual Offense            14
                    Theft                   1904
                    Theft of Vehicle         623
Masten Park         Assault                 2067
                    Breaking & Entering     1695
                    Homicide                  45
                    Other Sexual Offense      63
                    Robbery                  692
                    Sexual Assault            74
                    Sexual Offense             6
                    Theft                   3297
                    Theft of Vehicle         820
North Park          Assault                 1269
                    Breaking & Entering     1601
                    Homicide                  13
                    Other Sexual Offense      52
                    Robbery                  526
                    Sexual Assault            65
                    Sexual Offense            12
                    Theft                   9274
                    Theft of Vehicle         803
Parkside            Assault                  447
                    Breaking & Entering      967
                    Homicide                   3
                    Other Sexual Offense      21
                    Robbery                  257
                    Sexual Assault            33
                    Sexual Offense             5
                    Theft                   3273
                    Theft of Vehicle         344
Pratt-Willert       Assault                 1072
                    Breaking & Entering      820
                    Homicide                  21
                    Other Sexual Offense      39
                    Robbery                  310
                    Sexual Assault            52
                    Sexual Offense             5
                    Theft                   1554
                    Theft of Vehicle         372
Riverside           Assault                 1966
                    Breaking & Entering     1949
                    Homicide                  31
                    Other Sexual Offense      92
                    Robbery                  595
                    Sexual Assault            87
                    Sexual Offense            21
                    Theft                   3779
                    Theft of Vehicle         727
Schiller Park       Assault                 2971
                    Breaking & Entering     2991
                    Homicide                  61
                    Other Sexual Offense     103
                    Robbery                 1006
                    Sexual Assault            91
                    Sexual Offense            16
                    Theft                   3517
                    Theft of Vehicle        1162
Seneca Babcock      Assault                  699
                    Breaking & Entering      664
                    Homicide                   3
                    Other Sexual Offense      24
                    Robbery                  147
                    Sexual Assault            25
                    Sexual Offense             7
                    Theft                   1145
                    Theft of Vehicle         282
Seneca-Cazenovia    Assault                 1543
                    Breaking & Entering     1173
                    Homicide                  12
                    Other Sexual Offense      78
                    Robbery                  246
                    Sexual Assault            45
                    Sexual Offense            14
                    Theft                   2724
                    Theft of Vehicle         494
South Park          Assault                 1197
                    Breaking & Entering     1061
                    Homicide                   8
                    Other Sexual Offense      57
                    Robbery                  136
                    Sexual Assault            36
                    Sexual Offense             7
                    Theft                   2959
                    Theft of Vehicle         472
UNKNOWN             Assault                  577
                    Breaking & Entering      480
                    Homicide                   3
                    Other Sexual Offense      19
                    Robbery                  182
                    Sexual Assault            25
                    Sexual Offense             1
                    Theft                   1438
                    Theft of Vehicle         164
University Heights  Assault                 1807
                    Breaking & Entering     2549
                    Homicide                  32
                    Other Sexual Offense      78
                    Robbery                  919
                    Sexual Assault            78
                    Sexual Offense            11
                    Theft                   4268
                    Theft of Vehicle         854
Upper West Side     Assault                 1958
                    Breaking & Entering     2386
                    Homicide                  36
                    Other Sexual Offense      85
                    Robbery                  940
                    Sexual Assault           104
                    Sexual Offense            13
                    Theft                   4343
                    Theft of Vehicle         834
West Hertel         Assault                  931
                    Breaking & Entering      840
                    Homicide                  10
                    Other Sexual Offense      71
                    Robbery                  245
                    Sexual Assault            51
                    Sexual Offense            12
                    Theft                   2688
                    Theft of Vehicle         420
West Side           Assault                 2174
                    Breaking & Entering     2266
                    Homicide                  37
                    Other Sexual Offense      85
                    Robbery                  684
                    Sexual Assault           104
                    Sexual Offense            14
                    Theft                   3951
                    Theft of Vehicle         744
dtype: int64

II. Data Cleaning

Drop missing values

In [ ]:
crime.dropna(how='any',inplace=True)
crime.shape
Out[ ]:
(277249, 12)

Fix naming error of column Day of Week

In [ ]:
crime['day_of_week']=crime['day_of_week'].str.upper()
In [ ]:
crime['day_of_week'].value_counts()
Out[ ]:
FRIDAY       41605
SATURDAY     41092
MONDAY       39309
WEDNESDAY    39037
TUESDAY      38783
THURSDAY     38732
SUNDAY       38691
Name: day_of_week, dtype: int64

III. Visualization

Type of crime incidents

In [ ]:
len(crime['parent_incident_type'])
Out[ ]:
277249
In [ ]:
plt.figure(figsize=(12,5))
chart = sns.countplot(y='parent_incident_type', data=crime,palette='Spectral')
# set name for the plot
chart.set_title(f'{chart.get_ylabel().capitalize()}',fontweight='bold')
# add percentages for each bar
for p in chart.patches:
    chart.text(p.get_width(),p.get_y()+0.5,'{:1.2f}%'.format(p.get_width()*100/ float(len(crime['parent_incident_type']))),ha='left')
plt.show()
  • 43.56% - Almost haft of recorded crime incidents cases that happened in Buffalo are theft cases.

  • Top crime incidents are theft, assault, and breaking and entering.

Day of Week and Hour of Day

In [ ]:
# create function to draw multiple countplots 
def plot_multiple_countplots(crime, cols,num_cols,num_rows, hue=None):
             
    fig, axs = plt.subplots(num_rows, num_cols,figsize=(20, 10))
  
    for index, col in enumerate(cols):
        i = math.floor(index/num_cols)
        j = index - i*num_cols      
        
        if num_rows == 1:
            if num_cols == 1:
                chart = sns.countplot(x=crime[col], ax=axs, hue = hue, palette='Spectral')              
            else:
                chart = sns.countplot(x=crime[col], ax=axs[j],hue = hue, palette='Spectral')                
        else:
            chart = sns.countplot(x=crime[col], ax=axs[i, j],hue = hue, palette='Spectral')         
        # rotate axis labels   
        chart.set_xticklabels(chart.get_xticklabels(), rotation=15, ha ='center')           
        # set names each countplot
        chart.set_title(f'{chart.get_xlabel().capitalize()}',fontweight='bold')              
        # add percentages on top of each bar
        for p in chart.patches:               
            chart.text(p.get_x(),p.get_height()+1,'{:1.2f}%'.format(p.get_height()*100/ float(len(crime[col]))),ha='left')
In [ ]:
plot_multiple_countplots(crime, ['day_of_week','hour_of_day'],2,1)

1. Day of week:

  • Friday, Saturday and Sunday are a little more dangerous than other days.

2. Hour of Day:

  • 12am is the time that most likely for crime incidents.
In [ ]:
plt.figure(figsize=(17,10))
chart = sns.boxplot(x='parent_incident_type', y='hour_of_day',data=crime, hue ='day_of_week' , palette='Spectral')
chart.set_title(f'Timeline of Incident Types',fontweight='bold')
plt.show()
  • Most crime cases about proverty such as Theft, Theft of Vehicle, Breaking & Entering happpen around 8 a.m and 4 p.m - the time frame of working hours.

  • Cases about interaction conflict such as Assault, Robbery, Sexual Assault and Homicide have a fluctuated time frame.

Does most crime happen at weekend?

In [ ]:
crime['Weekend'] = crime['day_of_week'].isin(['SATURDAY', 'SUNDAY'])
ax=sns.catplot(x='parent_incident_type', y='hour_of_day', hue='Weekend', kind='box', dodge=False, data=crime)
ax.fig.suptitle(f'Crime on Weekend',fontweight='bold')
ax.fig.set_size_inches(17,5)
  • The answer is YES. Most crimes occur more frequently at weekend.

Does most crime happen at night time?

In [ ]:
x = [0,1,2,3,4,5,6,20,21,22,23]
crime['Night Time'] = crime['hour_of_day'].isin(x)

plt.figure(figsize=(12,7))
chart = sns.countplot(y='parent_incident_type', data=crime,hue='Night Time')
# set name for the plot
chart.set_title(f'Crime at Night',fontweight='bold')
# add percentages for each bar
for p in chart.patches:
    chart.text(p.get_width(),p.get_y()+0.25,'{:1.2f}%'.format(p.get_width()*100/ float(len(crime['parent_incident_type']))),ha='left')
plt.show()
  • Night time in this project is from 8 p.m to 6 a.m.

  • Only Theft , Breaking & Entering, and Sexual Offense occur more at day time. Because day time, especically from 8 a.m to 4 p.m is the time frame of office working hours. People leaving for work, stay in the office are good condition for thief and intruder.

  • All other types of crime incident occur more at night time.

Neighborhood and Location

In [ ]:
plt.figure(figsize=(17,10))
chart = sns.countplot(y='neighborhood_1', data=crime,palette='Spectral')
# set name for the plot
chart.set_title(f'Neighborhood and Crime Cases',fontweight='bold')
for p in chart.patches:
    chart.text(p.get_width(),p.get_y()+0.5,'{:1.2f}%'.format(p.get_width()*100/ float(len(crime['neighborhood_1']))),ha='left')
plt.show()

High frequency of crime - Dangerous Neighborhoods :

  1. Broadway Fillmore

  2. Central

  3. Kensington-Bailey

  4. Noth Park

  5. Genesee-Moselle

Low frequency of crime - Safe Neighborhoods:

  1. First Ward

  2. Seneca Babcock

  3. Central Park

  4. Kaisertown

  5. Ellicott

In [ ]:
plt.figure(figsize=(17,10))
chart = sns.countplot(y='neighborhood_1', data=crime, hue= 'parent_incident_type',palette='Spectral')
# set name for the plot
chart.set_title(f'Neighborhood with Incident Type',fontweight='bold')
plt.show()
  • North Park is the neighborhood where incident happened in highest frequency and most cases are theft.

  • Neighborhoods that suffered from Theft: Noth Park, Broadway Fillmore, Central, Kensington-Bailey, Elmwood Bidwell and Elmwood Bryant.

WHAT-IF No-Theft?

Because 43.55% recorded cases are theft cases, so to have a closer look in other incident types that happened in different neighborhoods, this step remove all the theft cases.

In [ ]:
# Remove all Theft cases
crime2 = crime
crime2 =  crime2[crime2['parent_incident_type'].str.contains('Theft')==False]
# Draw chart
plt.figure(figsize=(17,10))
chart = sns.countplot(y='neighborhood_1', data=crime2, hue= 'parent_incident_type',palette='tab10')
# set name for the plot
chart.set_title(f'Neighborhood with Non-Theft Incident Type',fontweight='bold')
plt.show()
  • Without theft cases involved, neighborhoods are suffered from assault, breaking and entering.

  • High frequency of Assault: Broadway Fillmore, Genersee-Moselle, Schiller Park, Central, Kenfield and Delavan Crider.

  • High frequency of Breaking & Entering: Broadway Fillmore, Genersee-Moselle, Schiller Park, Kensington-Bailey, University Heights.

  • High frequency of Robbery: Broadway Fillmore, Genersee-Moselle, and University Heights.

  • Without theft cases involved, North Park is now no longer the most dangerous neighborhood.

In [ ]:
plt.figure(figsize=(17,10))
chart = sns.countplot(y='neighborhood_1', hue = 'day_of_week',data=crime,palette='Spectral')
# set name for the plot
chart.set_title(f'Day of Crime in Neighborhood',fontweight='bold')
plt.show()
  • Central neigborhood is more dangrous at weekend.

  • All days of week arlarm: Broadway Fillmore, North Park, Kensington-Bailey, Schiller Park, Genersee-Moselle and Emlwood Bidwell.

Check duplicated addresses

In [ ]:
crime.duplicated(subset=['address_1'],keep='first').sum()
Out[ ]:
256732
  • It is interesting that total records number is 279330 and duplicated address number is 257356.

=> More than 92% addresses had more than 2 crime cases in records.

In [ ]:
crime.loc[crime.duplicated(subset=['address_1'], keep='first'),:]
Out[ ]:
case_number incident_datetime parent_incident_type hour_of_day day_of_week address_1 city state location latitude longitude neighborhood_1 Weekend Night Time
47 22-1080622 2022-04-18T16:40:00.000 Theft 16 MONDAY 600 Block AMHERST ST Buffalo NY POINT (-78.816 42.94) 42.940 -78.816 Kensington-Bailey False False
56 22-1090264 2022-04-18T09:55:00.000 Theft 9 TUESDAY 2100 Block DELAWARE AV Buffalo NY POINT (-78.868 42.943) 42.943 -78.868 Parkside False False
61 22-1080286 2022-04-18T10:34:00.000 Theft 10 MONDAY 2600 Block DELAWARE AV Buffalo NY POINT (-78.874 42.955) 42.955 -78.874 North Park False False
63 22-1080888 2022-04-18T22:26:19.000 Theft 22 MONDAY 1600 Block HERTEL AV Buffalo NY POINT (-78.846 42.948) 42.948 -78.846 North Park False True
79 22-1100299 2022-04-20T10:50:30.000 Theft 10 WEDNESDAY 0 Block W WOODSIDE AV Buffalo NY POINT (-78.831 42.842) 42.842 -78.831 Hopkins-Tifft False False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
281196 15-2880736 2015-10-15T17:31:00.000 Theft of Vehicle 17 THURSDAY MAIN ST & SUMMER ST Buffalo NY POINT (-78.868 42.905) 42.905 -78.868 Elmwood Bryant False False
281199 14-0060987 2014-01-06T17:00:00.000 Theft of Vehicle 17 MONDAY STRAUSS ST & BROADWAY Buffalo NY POINT (-78.841 42.893) 42.893 -78.841 Broadway Fillmore False False
281201 20-3500696 2020-12-15T19:00:00.000 Assault 19 TUESDAY MAIN ST & LISBON AV Buffalo NY POINT (-78.828 42.95) 42.950 -78.828 University Heights False False
281202 10-3270565 2010-11-23T15:50:00.000 Theft 15 TUESDAY 2500 Block BAILEY AV Buffalo NY POINT (-78.813 42.924) 42.924 -78.813 Kenfield False False
281210 12-2700238 2012-09-26T09:47:00.000 Theft 9 WEDNESDAY 700 Block BAILEY AV Buffalo NY POINT (-78.824 42.848) 42.848 -78.824 Hopkins-Tifft False False

256732 rows × 14 columns

IV. Spatial Data

In [ ]:
%%time 

!apt install gdal-bin python-gdal python3-gdal 
# Install rtree - Geopandas requirment
!apt install python3-rtree 
# Install Geopandas
!pip install git+git://github.com/geopandas/geopandas.git
# Install descartes - Geopandas requirment
!pip install descartes
Reading package lists... Done
Building dependency tree       
Reading state information... Done
gdal-bin is already the newest version (2.2.3+dfsg-2).
python-gdal is already the newest version (2.2.3+dfsg-2).
The following additional packages will be installed:
  python3-numpy
Suggested packages:
  python-numpy-doc python3-nose python3-numpy-dbg
The following NEW packages will be installed:
  python3-gdal python3-numpy
0 upgraded, 2 newly installed, 0 to remove and 41 not upgraded.
Need to get 2,288 kB of archives.
After this operation, 13.2 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/main amd64 python3-numpy amd64 1:1.13.3-2ubuntu1 [1,943 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 python3-gdal amd64 2.2.3+dfsg-2 [346 kB]
Fetched 2,288 kB in 0s (6,647 kB/s)
Selecting previously unselected package python3-numpy.
(Reading database ... 155514 files and directories currently installed.)
Preparing to unpack .../python3-numpy_1%3a1.13.3-2ubuntu1_amd64.deb ...
Unpacking python3-numpy (1:1.13.3-2ubuntu1) ...
Selecting previously unselected package python3-gdal.
Preparing to unpack .../python3-gdal_2.2.3+dfsg-2_amd64.deb ...
Unpacking python3-gdal (2.2.3+dfsg-2) ...
Setting up python3-numpy (1:1.13.3-2ubuntu1) ...
Setting up python3-gdal (2.2.3+dfsg-2) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libspatialindex-c4v5 libspatialindex-dev libspatialindex4v5
  python3-pkg-resources
Suggested packages:
  python3-setuptools
The following NEW packages will be installed:
  libspatialindex-c4v5 libspatialindex-dev libspatialindex4v5
  python3-pkg-resources python3-rtree
0 upgraded, 5 newly installed, 0 to remove and 41 not upgraded.
Need to get 671 kB of archives.
After this operation, 3,948 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libspatialindex4v5 amd64 1.8.5-5 [219 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libspatialindex-c4v5 amd64 1.8.5-5 [51.7 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic/main amd64 python3-pkg-resources all 39.0.1-2 [98.8 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libspatialindex-dev amd64 1.8.5-5 [285 kB]
Get:5 http://archive.ubuntu.com/ubuntu bionic/universe amd64 python3-rtree all 0.8.3+ds-1 [16.9 kB]
Fetched 671 kB in 0s (2,437 kB/s)
Selecting previously unselected package libspatialindex4v5:amd64.
(Reading database ... 155924 files and directories currently installed.)
Preparing to unpack .../libspatialindex4v5_1.8.5-5_amd64.deb ...
Unpacking libspatialindex4v5:amd64 (1.8.5-5) ...
Selecting previously unselected package libspatialindex-c4v5:amd64.
Preparing to unpack .../libspatialindex-c4v5_1.8.5-5_amd64.deb ...
Unpacking libspatialindex-c4v5:amd64 (1.8.5-5) ...
Selecting previously unselected package python3-pkg-resources.
Preparing to unpack .../python3-pkg-resources_39.0.1-2_all.deb ...
Unpacking python3-pkg-resources (39.0.1-2) ...
Selecting previously unselected package libspatialindex-dev:amd64.
Preparing to unpack .../libspatialindex-dev_1.8.5-5_amd64.deb ...
Unpacking libspatialindex-dev:amd64 (1.8.5-5) ...
Selecting previously unselected package python3-rtree.
Preparing to unpack .../python3-rtree_0.8.3+ds-1_all.deb ...
Unpacking python3-rtree (0.8.3+ds-1) ...
Setting up libspatialindex4v5:amd64 (1.8.5-5) ...
Setting up python3-pkg-resources (39.0.1-2) ...
Setting up libspatialindex-c4v5:amd64 (1.8.5-5) ...
Setting up libspatialindex-dev:amd64 (1.8.5-5) ...
Setting up python3-rtree (0.8.3+ds-1) ...
Processing triggers for libc-bin (2.27-3ubuntu1.3) ...
/sbin/ldconfig.real: /usr/local/lib/python3.7/dist-packages/ideep4py/lib/libmkldnn.so.0 is not a symbolic link

Collecting git+git://github.com/geopandas/geopandas.git
  Cloning git://github.com/geopandas/geopandas.git to /tmp/pip-req-build-miki3r3j
  Running command git clone -q git://github.com/geopandas/geopandas.git /tmp/pip-req-build-miki3r3j
  fatal: remote error:
    The unauthenticated git protocol on port 9418 is no longer supported.
  Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
WARNING: Discarding git+git://github.com/geopandas/geopandas.git. Command errored out with exit status 128: git clone -q git://github.com/geopandas/geopandas.git /tmp/pip-req-build-miki3r3j Check the logs for full command output.
ERROR: Command errored out with exit status 128: git clone -q git://github.com/geopandas/geopandas.git /tmp/pip-req-build-miki3r3j Check the logs for full command output.
Requirement already satisfied: descartes in /usr/local/lib/python3.7/dist-packages (1.1.0)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from descartes) (3.2.2)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->descartes) (2.8.2)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->descartes) (3.0.8)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->descartes) (1.4.2)
Requirement already satisfied: numpy>=1.11 in /usr/local/lib/python3.7/dist-packages (from matplotlib->descartes) (1.21.6)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->descartes) (0.11.0)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from kiwisolver>=1.0.1->matplotlib->descartes) (4.2.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib->descartes) (1.15.0)
CPU times: user 287 ms, sys: 90.3 ms, total: 377 ms
Wall time: 23.5 s
In [ ]:
!pip install geopandas
Collecting geopandas
  Downloading geopandas-0.10.2-py2.py3-none-any.whl (1.0 MB)
     |████████████████████████████████| 1.0 MB 4.9 MB/s 
Requirement already satisfied: shapely>=1.6 in /usr/local/lib/python3.7/dist-packages (from geopandas) (1.8.1.post1)
Requirement already satisfied: pandas>=0.25.0 in /usr/local/lib/python3.7/dist-packages (from geopandas) (1.3.5)
Collecting pyproj>=2.2.0
  Downloading pyproj-3.2.1-cp37-cp37m-manylinux2010_x86_64.whl (6.3 MB)
     |████████████████████████████████| 6.3 MB 35.0 MB/s 
Collecting fiona>=1.8
  Downloading Fiona-1.8.21-cp37-cp37m-manylinux2014_x86_64.whl (16.7 MB)
     |████████████████████████████████| 16.7 MB 359 kB/s 
Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (2021.10.8)
Collecting cligj>=0.5
  Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Requirement already satisfied: click>=4.0 in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (7.1.2)
Collecting click-plugins>=1.0
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Requirement already satisfied: six>=1.7 in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (1.15.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (57.4.0)
Requirement already satisfied: attrs>=17 in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (21.4.0)
Collecting munch
  Downloading munch-2.5.0-py2.py3-none-any.whl (10 kB)
Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.25.0->geopandas) (1.21.6)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.25.0->geopandas) (2022.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.25.0->geopandas) (2.8.2)
Installing collected packages: munch, cligj, click-plugins, pyproj, fiona, geopandas
Successfully installed click-plugins-1.1.1 cligj-0.7.2 fiona-1.8.21 geopandas-0.10.2 munch-2.5.0 pyproj-3.2.1
In [ ]:
import geopandas as gpd
In [ ]:
pd.set_option('display.max_columns',None) 
# Add $limit=300000 to read in all records, defalt is 1000 records.
crime_url = "https://data.buffalony.gov/resource/d6g9-xbgu.geojson?$limit=300000"
crime_gdf = gpd.read_file(crime_url)
#crime_gdf = gpd.read_file(crime_url, ignore_fields=["iso_a3", "gdp_md_est"])
crime_gdf.tail()
Out[ ]:
city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week incident_id tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary updated_at case_number census_tract_2010 incident_datetime council_district geometry
281212 Buffalo UNKNOWN UNKNOWN None Theft NY UNKNOWN Tuesday None UNKNOWN Buffalo Police are investigating this report o... UNKNOWN UNKNOWN None UNKNOWN UNKNOWN UNKNOWN UNKNOWN 9 2011-06-23T06:00:00 600 Block MINNESOTA AV UNKNOWN LARCENY/THEFT None 11-1730227 UNKNOWN 2011-06-21T09:30:00 UNKNOWN None
281213 Buffalo None None None Breaking & Entering NY None Thursday None None Buffalo Police are investigating this report o... None None None None None None None 17 2021-01-28T17:06:43 500 Block DODGE ST None BURGLARY None 21-0280538 None 2021-01-28T17:05:43 None None
281214 Buffalo UNKNOWN UNKNOWN None Theft NY UNKNOWN Friday None UNKNOWN Buffalo Police are investigating this report o... UNKNOWN UNKNOWN None UNKNOWN UNKNOWN UNKNOWN UNKNOWN 21 2019-09-24T16:08:00 HURON ST & DELAWARE AV UNKNOWN LARCENY/THEFT None 13-1380405 UNKNOWN 2013-05-17T21:30:00 UNKNOWN None
281215 Buffalo UNKNOWN UNKNOWN None Theft NY UNKNOWN Wednesday None UNKNOWN Buffalo Police are investigating this report o... UNKNOWN UNKNOWN None UNKNOWN UNKNOWN UNKNOWN UNKNOWN 10 2012-03-29T06:00:00 100 Block LINCOLN PW UNKNOWN LARCENY/THEFT None 12-0880252 UNKNOWN 2012-03-28T10:26:00 UNKNOWN None
281216 Buffalo UNKNOWN UNKNOWN None Theft NY UNKNOWN Thursday None UNKNOWN Buffalo Police are investigating this report o... UNKNOWN UNKNOWN None UNKNOWN UNKNOWN UNKNOWN UNKNOWN 7 2013-10-25T06:02:00 1 Block FOUNTAIN PA UNKNOWN LARCENY/THEFT None 13-2970827 UNKNOWN 2013-10-24T07:10:00 UNKNOWN None
In [ ]:
!pip install contextily
Collecting contextily
  Downloading contextily-1.2.0-py3-none-any.whl (16 kB)
Collecting xyzservices
  Downloading xyzservices-2022.4.0-py3-none-any.whl (36 kB)
Requirement already satisfied: geopy in /usr/local/lib/python3.7/dist-packages (from contextily) (1.17.0)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from contextily) (3.2.2)
Collecting mercantile
  Downloading mercantile-1.2.1-py3-none-any.whl (14 kB)
Requirement already satisfied: pillow in /usr/local/lib/python3.7/dist-packages (from contextily) (7.1.2)
Collecting rasterio
  Downloading rasterio-1.2.10-cp37-cp37m-manylinux1_x86_64.whl (19.3 MB)
     |████████████████████████████████| 19.3 MB 3.6 MB/s 
Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from contextily) (1.1.0)
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from contextily) (2.23.0)
Requirement already satisfied: geographiclib<2,>=1.49 in /usr/local/lib/python3.7/dist-packages (from geopy->contextily) (1.52)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->contextily) (2.8.2)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->contextily) (3.0.8)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->contextily) (1.4.2)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->contextily) (0.11.0)
Requirement already satisfied: numpy>=1.11 in /usr/local/lib/python3.7/dist-packages (from matplotlib->contextily) (1.21.6)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from kiwisolver>=1.0.1->matplotlib->contextily) (4.2.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib->contextily) (1.15.0)
Requirement already satisfied: click>=3.0 in /usr/local/lib/python3.7/dist-packages (from mercantile->contextily) (7.1.2)
Requirement already satisfied: click-plugins in /usr/local/lib/python3.7/dist-packages (from rasterio->contextily) (1.1.1)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from rasterio->contextily) (57.4.0)
Requirement already satisfied: attrs in /usr/local/lib/python3.7/dist-packages (from rasterio->contextily) (21.4.0)
Collecting snuggs>=1.4.1
  Downloading snuggs-1.4.7-py3-none-any.whl (5.4 kB)
Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from rasterio->contextily) (2021.10.8)
Requirement already satisfied: cligj>=0.5 in /usr/local/lib/python3.7/dist-packages (from rasterio->contextily) (0.7.2)
Collecting affine
  Downloading affine-2.3.1-py2.py3-none-any.whl (16 kB)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->contextily) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->contextily) (1.24.3)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->contextily) (3.0.4)
Installing collected packages: snuggs, affine, xyzservices, rasterio, mercantile, contextily
Successfully installed affine-2.3.1 contextily-1.2.0 mercantile-1.2.1 rasterio-1.2.10 snuggs-1.4.7 xyzservices-2022.4.0
In [ ]:
import contextily as ctx
%matplotlib inline
In [ ]:
crime_gdf.drop(['incident_id','updated_at'], axis=1,inplace=True)
crime_gdf.head()
Out[ ]:
city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime council_district geometry
0 Buffalo UNKNOWN UNKNOWN 42.723 Theft of Vehicle NY UNKNOWN Saturday UNKNOWN Buffalo Police are investigating this report o... UNKNOWN UNKNOWN -78.815 UNKNOWN UNKNOWN UNKNOWN UNKNOWN 11 2022-04-16T11:47:00 0 Block JANICE ST UNKNOWN UUV 22-1060340 UNKNOWN 2022-04-16T11:46:00 UNKNOWN POINT (-78.81500 42.72300)
1 Buffalo Riverside District D 42.956 Assault NY 360290058023003 Monday 005802 Buffalo Police are investigating this report o... 36029005802 3003 -78.905 3003 3 3 58.02 3 2022-04-18T03:43:55 0 Block CONDON AV 360290058023 ASSAULT 22-1080081 58.02 2022-04-18T02:57:55 NORTH POINT (-78.90500 42.95600)
2 Buffalo University Heights District E 42.953 Theft of Vehicle NY 360290046013001 Monday 004601 Buffalo Police are investigating this report o... 36029004601 3001 -78.828 4001 3 4 46.01 10 2022-04-18T10:31:11 0 Block MERRIMAC ST 360290046013 UUV 22-1080284 46.01 2022-04-18T10:31:11 UNIVERSITY POINT (-78.82800 42.95300)
3 Buffalo Broadway Fillmore District C 42.897 Assault NY 360290027031012 Saturday 002703 Buffalo Police are investigating this report o... 36029002703 1012 -78.838 4004 1 4 27.03 21 2022-04-16T21:36:42 100 Block BECK ST 360290027031 ASSAULT 22-1060834 27.02 2022-04-16T21:36:42 FILLMORE POINT (-78.83800 42.89700)
4 Buffalo Masten Park District C 42.91 Theft NY 360290033022004 Saturday 003302 Buffalo Police are investigating this report o... 36029003302 2004 -78.844 2004 2 2 33.02 19 2022-04-16T19:24:48 900 Block HUMBOLDT PW 360290033022 LARCENY/THEFT 22-1060730 33.02 2022-04-16T10:23:48 MASTEN POINT (-78.84400 42.91000)

Check the Coordinate Reference System(CRS)

CRS defines how the two-dimensional, projected map in Geographic information system (GIS) relates to real places on the earth.

Check the CRS and change it to epsg:3857 to be able to draw plots.

In [ ]:
# Check crs
crime_gdf.crs
Out[ ]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
In [ ]:
# Change crs
crime_gdf.to_crs('epsg:3857',inplace=True)

Check and drop missing geometry rows

In [ ]:
crime_gdf.shape
Out[ ]:
(281217, 27)
In [ ]:
orig_rows = crime_gdf.shape[0]
crime_gdf = crime_gdf.loc[crime_gdf.geometry.notnull()]
print(f'Records with missing location information = {orig_rows-crime_gdf.shape[0]}')
Records with missing location information = 3930
In [ ]:
#crime_gdf.geometry=crime_gdf.geometry.astype(float)
crime_gdf.dropna(subset =['geometry'], how='any',inplace=True)
#crime_gdf.dropna( how='any',inplace=True)
crime_gdf.shape
Out[ ]:
(277287, 27)

Delete 1395 records that is missed location information because they are not useful and cannot show on the map.

Mapping crimes by neighborhoods

In [ ]:
fig, ax = plt.subplots(figsize=(15,15), subplot_kw=dict(aspect='equal'))
crime_gdf.plot(column=crime_gdf['neighborhood_1'], ax=ax);
ax.set_title('Crime Incident Locations of Buffalo by Neighborhood',fontdict={'fontsize': '25', 'fontweight' : '3'})
ax.set_axis_off()
ctx.add_basemap(ax)

There are some bad geometry data that the locations are not in NY state.

So the map is so big, it is not only Buffalo area.

In [ ]:
#crime_gdf = crime_gdf.GeoDataFrame.drop(columns=['incident_id'],  axis=1, inplace=True)
crime_gdf.council_district.unique()
Out[ ]:
array(['UNKNOWN', 'NORTH', 'UNIVERSITY', 'FILLMORE', 'MASTEN', 'ELLICOTT',
       'NIAGARA', 'LOVEJOY', 'SOUTH', 'DELAWARE'], dtype=object)

There are 'UNKNOWN' council district in the dataset that it cause above problem when mapping. To solve this problem, fixing it by removing the UNKNOWN council_district.

In [ ]:
# set council_district as index of the dataframe
crime_gdf.set_index('council_district',inplace=True)
crime_gdf.head()
Out[ ]:
city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime geometry
council_district
UNKNOWN Buffalo UNKNOWN UNKNOWN 42.723 Theft of Vehicle NY UNKNOWN Saturday UNKNOWN Buffalo Police are investigating this report o... UNKNOWN UNKNOWN -78.815 UNKNOWN UNKNOWN UNKNOWN UNKNOWN 11 2022-04-16T11:47:00 0 Block JANICE ST UNKNOWN UUV 22-1060340 UNKNOWN 2022-04-16T11:46:00 POINT (-8773645.667 5269904.214)
NORTH Buffalo Riverside District D 42.956 Assault NY 360290058023003 Monday 005802 Buffalo Police are investigating this report o... 36029005802 3003 -78.905 3003 3 3 58.02 3 2022-04-18T03:43:55 0 Block CONDON AV 360290058023 ASSAULT 22-1080081 58.02 2022-04-18T02:57:55 POINT (-8783664.421 5305276.995)
UNIVERSITY Buffalo University Heights District E 42.953 Theft of Vehicle NY 360290046013001 Monday 004601 Buffalo Police are investigating this report o... 36029004601 3001 -78.828 4001 3 4 46.01 10 2022-04-18T10:31:11 0 Block MERRIMAC ST 360290046013 UUV 22-1080284 46.01 2022-04-18T10:31:11 POINT (-8775092.820 5304820.702)
FILLMORE Buffalo Broadway Fillmore District C 42.897 Assault NY 360290027031012 Saturday 002703 Buffalo Police are investigating this report o... 36029002703 1012 -78.838 4004 1 4 27.03 21 2022-04-16T21:36:42 100 Block BECK ST 360290027031 ASSAULT 22-1060834 27.02 2022-04-16T21:36:42 POINT (-8776206.015 5296307.314)
MASTEN Buffalo Masten Park District C 42.91 Theft NY 360290033022004 Saturday 003302 Buffalo Police are investigating this report o... 36029003302 2004 -78.844 2004 2 2 33.02 19 2022-04-16T19:24:48 900 Block HUMBOLDT PW 360290033022 LARCENY/THEFT 22-1060730 33.02 2022-04-16T10:23:48 POINT (-8776873.932 5298282.947)
In [ ]:
crime_gdf.drop(['UNKNOWN'] , axis=0,inplace=True)
In [ ]:
fig, ax = plt.subplots(figsize=(15,15), subplot_kw=dict(aspect='equal'))
crime_gdf.plot(column=crime_gdf['neighborhood_1'], ax=ax);
ax.set_title('Crime Incident Locations of Buffalo by Neighborhood',fontdict={'fontsize': '25', 'fontweight' : '3'})
ax.set_axis_off()
ctx.add_basemap(ax)

After delete UNKNOWN council district records, the map now available show all cases in Buffalo area. Based on the map, almost every places in Buffalo have a record of crime incidents. Only the Sounth and Delaware County Districts show some blank area with no crime records. These area are parks.

Mapping by crime types

In [ ]:
crime_gdf.parent_incident_type.unique()
Out[ ]:
array(['Assault', 'Theft of Vehicle', 'Theft', 'Robbery',
       'Breaking & Entering', 'Sexual Offense', 'Homicide',
       'Other Sexual Offense', 'Sexual Assault'], dtype=object)

There are 9 types of crime in the dataset. This part is drawing plot that tell different of 2 crime types: Assault and Homicide.

In [ ]:
crime_gdf.reset_index(inplace=True)
In [ ]:
crime_gdf['conrank'] = 'lightgray'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Theft','conrank']='red'
crime_gdf.loc[crime_gdf.parent_incident_type == 'Assault','conrank']='blue'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Robbery','conrank']='purple'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Theft of Vehicle','conrank']='organce'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Breaking & Entering','conrank']='yellow'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Sexual Offense','conrank']='violet'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Other Sexual Offense','conrank']='brown'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Sexual Assault','conrank']='lime'
crime_gdf.loc[crime_gdf.parent_incident_type == 'Homicide','conrank']='deepPink'
crime_gdf.loc[~crime_gdf.parent_incident_type.isin(['Homicide','Assault']),'conrank']=='gray'
Out[ ]:
1         False
3         False
4         False
6         False
7         False
          ...  
276807    False
276808    False
276809    False
276811    False
276812    False
Name: conrank, Length: 218922, dtype: bool
In [ ]:
import matplotlib.lines as mlines

fig, ax = plt.subplots(figsize=(12,12), subplot_kw=dict(aspect='equal'))

deepPink_marker = mlines.Line2D([], [], color='deepPink', marker='.', linestyle='None',
                          markersize=10,label='Homicide')
blue_marker = mlines.Line2D([], [], color='blue', marker='.', linestyle='None',
                          markersize=10,label='Assault')
gray_marker=mlines.Line2D([], [], color='gray', marker='.', linestyle='None',
                          markersize=10, label='Other types')
ax.legend(handles=[deepPink_marker,blue_marker,gray_marker])

crime_gdf.plot(color=crime_gdf['conrank'], ax=ax)
ax.set_title('Buffalo Assault and Homicide Crime Cases',fontdict={'fontsize': '25', 'fontweight' : '3'})
ax.set_axis_off()
ctx.add_basemap(ax)

This map is showing the different about locations and number of cases in Assault type and Homicide type. Assault is the second most common type of crime that happended in Buffalo. Assault incidents were occurred a lot compare to Homicide incidents.

Although the quantity of cases is different, Assault and Homicide cases are both scattered occurred all around Buffalo.

Mapping Duplicated locations

There are a lot of locations that has more than 1 recorded crime cases. This part is to show the duplicated addresses on the dataset.

In [ ]:
# Total duplcatated address here is smaller than above because I did remove some rows that missing geometry 
crime_gdf.duplicated(subset=['address_1'],keep='first').sum()
Out[ ]:
256402
In [ ]:
fig, ax = plt.subplots(figsize=(15,15), subplot_kw=dict(aspect='equal'))
crime_gdf.plot(column=crime_gdf.duplicated(subset=['address_1'],keep='first'), ax=ax);
ax.set_title('>= 2 Crime Incidents Cases Locations of Buffalo',fontdict={'fontsize': '25', 'fontweight' : '3'})
ax.set_axis_off()
ctx.add_basemap(ax)

Yellow dots are locations of places where crime happened more than 2 times, and black dots are locations that only have 1 crime case in the dataset. There is a definitely different about quantity of these two category, more than 92% of locations had more than 2 crime cases in records.

Theft and Homicide Point Frequency Maps

Point locations represent where the actual event occurred. This approach is only viable if there are point locations with multiple occurrences of the geographic event under consideration.

Bokeh

In [ ]:
from bokeh.tile_providers import CARTODBPOSITRON, get_provider
tileProvider = get_provider('CARTODBPOSITRON_RETINA')

from bokeh.io import output_notebook, show, output_file, save
from bokeh.plotting import figure
from bokeh.models import HoverTool, GeoJSONDataSource
from bokeh.layouts import row,column
from bokeh.models.widgets import Div

output_notebook()

TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
In [ ]:
kwargs = {"plot_width":800,
          "plot_height":700,
          "sizing_mode":'scale_both',
          "outline_line_color":'#046626',
          "outline_line_width":3,
          "outline_line_alpha":.3,
          'toolbar_location':'above',
          'border_fill_color':'#4287f5',
          'border_fill_alpha':.3,
          'min_border_left': 20,
          'min_border_right':20,
          'min_border_top': 10,
          'min_border_bottom':20}
In [ ]:
# Check null geometry 
orig_rows = crime_gdf.shape[0] 
crime_gdf = crime_gdf.loc[crime_gdf.geometry.notnull()]
print(f'Records with missing location information = {orig_rows-crime_gdf.shape[0]:,.0f}\n\
Percent missing = {((orig_rows-crime_gdf.shape[0])/orig_rows)*100:,.0f}%')
Records with missing location information = 0
Percent missing = 0%

Create a unique keys

This key is combine of Latitude and Longitude of locations that crimes happened more than 1 times

In [ ]:
crime_gdf['newLoc'] = crime_gdf.geometry.x.astype(str)+ crime_gdf.geometry.y.astype(str)
In [ ]:
numlocs = crime_gdf.newLoc.value_counts().rename_axis('uniquepts').to_frame('counts')
numlocs.head()
Out[ ]:
counts
uniquepts
-8773756.9863626515302843.689697811 1031
-8775204.1397429615299498.92780969 1016
-8780658.7947918335304972.796811193 922
-8780213.5168286585305124.894422529 795
-8781104.0727550075295243.684788868 783

At some locations, crime incidents occurred in highly high rate. For example, at the location -8773756.9863626515302843.689697811 only, there were 1026 crime cases!

In [ ]:
crime_gdf.geometry.value_counts().sum()
Out[ ]:
276813
In [ ]:
# Remove duplicate
uHl = crime_gdf.drop_duplicates(subset='newLoc').reset_index()
uHl.tail()
Out[ ]:
index council_district city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime geometry conrank newLoc
8540 274578 FILLMORE Buffalo Broadway Fillmore District C 42.884 Breaking & Entering NY 360290002001005 Sunday 001700 Buffalo Police are investigating this report o... 36029001700 1005 -78.837 1005 1 1 17 0 2018-10-09T08:05:00 400 Block HOWARD ST 360290001101 BURGLARY 18-2800271 17 2018-10-07T00:00:00 POINT (-8776094.696 5294332.098) lightgray -8776094.6956693085294332.098356516
8541 275321 ELLICOTT Buffalo Pratt-Willert District B 42.881 Theft of Vehicle NY 360290035021003 Thursday 001403 Buffalo Police are investigating this report o... 36029001403 1003 -78.858 1003 1 1 14.03 0 2015-04-23T06:06:00 400 Block N DIVISION ST 360290001101 UUV 15-1120636 14.02 2015-04-16T00:00:00 POINT (-8778432.405 5293876.338) lightgray -8778432.4049759675293876.338387279
8542 275512 ELLICOTT Buffalo Central District B 42.896 Theft NY 360290035021003 Sunday 007202 Buffalo Police are investigating this report o... 36029007202 1003 -78.902 1002 1 1 72.02 9 2019-09-24T22:42:00 1 Block LASALLE PKWY 360290001101 LARCENY/THEFT 07-1890708 72.02 2007-07-08T09:00:00 POINT (-8783330.463 5296155.360) lightgray -8783330.462570875296155.359902425
8543 275634 FILLMORE Buffalo Broadway Fillmore District C 42.889 Theft of Vehicle NY 360290001101002 Wednesday 001601 Buffalo Police are investigating this report o... 36029001601 1002 -78.831 2015 1 2 16.01 10 2019-09-24T22:47:00 1 Block CURTISS ST 360290001101 UUV 06-2770275 16 2006-10-04T10:00:00 POINT (-8775426.779 5295091.748) lightgray -8775426.778724555295091.747559128
8544 276104 SOUTH Buffalo Hopkins-Tifft District A 42.85 Theft of Vehicle NY 360290001103029 Wednesday 000110 Buffalo Police are investigating this report o... 36029000110 3029 -78.834 3034 3 3 1.10 0 2013-01-11T07:00:00 400 Block HOPKINS ST 360290001103 UUV 13-0100471 1.10 2012-12-19T00:00:00 POINT (-8775760.737 5289168.116) lightgray -8775760.737196935289168.115681344
In [ ]:
uHl.parent_incident_type.unique()
Out[ ]:
array(['Assault', 'Theft of Vehicle', 'Theft', 'Robbery',
       'Breaking & Entering', 'Sexual Offense', 'Homicide',
       'Other Sexual Offense', 'Sexual Assault'], dtype=object)
In [ ]:
allHl = pd.merge(uHl,numlocs,left_on='newLoc',right_on='uniquepts').drop(['newLoc'],axis=1)
print(f'Number of locations: {allHl.shape[0]}\n\
accounting for {allHl.counts.sum()} cases of crime incidents in Buffalo')
Number of locations: 8545
accounting for 276813 cases of crime incidents in Buffalo
In [ ]:
plt.hist(allHl.counts,bins=3)
Out[ ]:
(array([8.517e+03, 2.000e+01, 8.000e+00]),
 array([1.00000000e+00, 3.44333333e+02, 6.87666667e+02, 1.03100000e+03]),
 <a list of 3 Patch objects>)

Map

Wondering about locations of theft cases which is the most frequency crime type and homicide cases which is the most dangerous crime type.

In [ ]:
# Theft cases 
theftcases = allHl.loc[allHl.parent_incident_type	=='Theft'].copy()
print(f'Number of Theft cases: {theftcases.shape[0]}\n\
Accounting for {allHl.counts.sum()} cases of crime incidents in Buffalo')
Number of Theft cases: 3873
Accounting for 276813 cases of crime incidents in Buffalo
In [ ]:
# Homicide cases 
homicidecases = allHl.loc[allHl.parent_incident_type	=='Homicide'].copy()
print(f'Number of Homicide cases: {homicidecases.shape[0]}\n\
Accounting for {allHl.counts.sum()} cases of crime incidents in Buffalo')
Number of Homicide cases: 28
Accounting for 276813 cases of crime incidents in Buffalo
In [ ]:
maxcir = 60
maxcnt = theftcases.counts.max()
theftcases['radius']=(theftcases.counts/maxcnt*maxcir)
theftcases['radius']=theftcases['radius'].astype(float).round().astype(int)
theftcases.head()
Out[ ]:
index council_district city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime geometry conrank counts radius
3 3 MASTEN Buffalo Masten Park District C 42.91 Theft NY 360290033022004 Saturday 003302 Buffalo Police are investigating this report o... 36029003302 2004 -78.844 2004 2 2 33.02 19 2022-04-16T19:24:48 900 Block HUMBOLDT PW 360290033022 LARCENY/THEFT 22-1060730 33.02 2022-04-16T10:23:48 POINT (-8776873.932 5298282.947) lightgray 22 1
4 4 ELLICOTT Buffalo Pratt-Willert District B 42.895 Theft NY 360290025023000 Monday 002502 Buffalo Police are investigating this report o... 36029002502 3000 -78.857 2000 3 2 25.02 15 2022-04-18T15:18:44 200 Block CHERRY ST 360290025023 LARCENY/THEFT 22-1080545 25.02 2022-04-16T20:00:44 POINT (-8778321.085 5296003.408) lightgray 20 1
8 8 LOVEJOY Buffalo Kenfield District E 42.924 Theft NY 360290041003004 Saturday 004100 Buffalo Police are investigating this report o... 36029004100 3004 -78.813 3004 3 3 41 16 2022-04-16T16:17:21 2500 Block BAILEY AV 360290041003 LARCENY/THEFT 22-1060582 41 2022-04-16T16:16:21 POINT (-8773423.028 5300411.017) lightgray 90 5
9 9 UNIVERSITY Buffalo Kenfield District E 42.927 Theft NY 360290041001008 Saturday 004100 Buffalo Police are investigating this report o... 36029004100 1008 -78.818 1008 1 1 41 23 2022-04-16T23:54:48 0 Block ALMA AV 360290041001 LARCENY/THEFT 22-1060928 41 2022-04-16T23:53:48 POINT (-8773979.625 5300867.095) lightgray 77 4
10 10 NIAGARA Buffalo West Side District B 42.91 Theft NY 360290070001003 Saturday 007000 Buffalo Police are investigating this report o... 36029007000 1003 -78.899 1003 1 1 70 3 2022-04-16T03:03:30 900 Block NIAGARA ST 360290070001 LARCENY/THEFT 22-1060102 70 2022-04-16T03:02:30 POINT (-8782996.504 5298282.947) lightgray 217 13
In [ ]:
maxcir = 60
maxcnt = homicidecases.counts.max()
homicidecases['radius']=(homicidecases.counts/maxcnt*maxcir)
homicidecases['radius']=homicidecases['radius'].astype(float).round().astype(int)
homicidecases.head()
Out[ ]:
index council_district city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime geometry conrank counts radius
143 150 UNIVERSITY Buffalo Kenfield District E 42.928 Homicide NY 360290041001009 Wednesday 004100 Buffalo Police are investigating this report o... 36029004100 1009 -78.814 1000 1 1 41 13 2022-04-20T13:15:15 0 Block WESTON AV 360290041001 MURDER 22-1100422 41 2022-04-20T13:15:15 POINT (-8773534.347 5301019.126) deepPink 73 38
166 176 ELLICOTT Buffalo Masten Park District C 42.905 Homicide NY 360290033024003 Sunday 003302 Buffalo Police are investigating this report o... 36029003302 4003 -78.851 4003 4 4 33.02 15 2022-04-24T15:50:11 400 Block DODGE ST 360290033024 MURDER 22-1140684 33.02 2022-04-24T15:50:11 POINT (-8777653.169 5297523.039) deepPink 22 12
853 979 LOVEJOY Buffalo Schiller Park District E 42.917 Homicide NY 360290046012000 Thursday 003700 Buffalo Police are investigating this report o... 36029003700 2000 -78.804 2000 2 2 37 18 2019-09-24T22:39:00 ROGERS AV & GENESSE ST 360290001102 MURDER 07-3470896 37 2007-12-13T18:23:00 POINT (-8772421.152 5299346.922) deepPink 114 60
1102 1332 MASTEN Buffalo Delavan Grider District E 42.915 Homicide NY 360290002001005 Wednesday 003400 Buffalo Police are investigating this report o... 36029003400 1005 -78.826 1005 1 1 34 15 2019-09-24T16:05:00 E FERRY ST & STEVENS AV 360290001101 MURDER 12-2000574 34 2012-07-18T15:36:00 POINT (-8774870.181 5299042.916) deepPink 48 25
1768 2350 FILLMORE Buffalo Genesee-Moselle District C 42.898 Homicide NY 360290028012002 Monday 002801 Buffalo Police are investigating this report o... 36029002801 2002 -78.817 3003 2 3 28.01 2 2021-08-23T02:50:43 0 Block HIRSCHBECK ST 360290028012 MURDER 21-2350107 28 2021-08-23T02:50:43 POINT (-8773868.306 5296459.271) deepPink 48 25
In [ ]:
theftcases.to_crs('epsg:3857',inplace=True)
homicidecases.to_crs('epsg:3857',inplace=True)
output_file("/content/CrimePointFrequencyMaps.html",
            title="Locations with Frequency Crime Incidents in Buffalo")

f1 = figure(title = "Location of Theft cases in Buffalo", tools=TOOLS, toolbar_sticky=False,**kwargs)
f2 = figure(title = "Location of Homicide cases in Buffalo", tools=TOOLS, toolbar_sticky=False,**kwargs,
            x_range=f1.x_range,y_range=f1.y_range)

f1.add_tile(tileProvider)
f1.title.text_font_style = 'italic'
f1.title.text_font_size = '14pt'
f1.axis.visible=False 

f2.add_tile(tileProvider)
f2.title.text_font_style = 'italic'
f2.title.text_font_size = '14pt'
f2.axis.visible=False 

point_source_1 = GeoJSONDataSource(geojson=theftcases.to_json())
point_source_2 = GeoJSONDataSource(geojson=homicidecases.to_json())


Circle1=f1.circle('x','y',size='radius',fill_color='blue',line_color='blue',fill_alpha=0.5,source=point_source_1)
Circle2=f2.circle('x','y',size='radius',fill_color='red',line_color='red',fill_alpha=0.5,source=point_source_2)


c_hover= HoverTool(renderers=[Circle1])
c_hover.point_policy = "follow_mouse"
c_hover.tooltips=[("Address","@address_1," "@neighborhood_1"),
                  ("   " , "    "),
                  ("Number of Cases","@counts")]

f1.add_tools(c_hover)

c2_hover= HoverTool(renderers=[Circle2])
c2_hover.point_policy = "follow_mouse"
c2_hover.tooltips=[("Address","@address_1," "@neighborhood_1"),
                  ("   " , "    "),
                  ("Number of Cases","@counts")]

f2.add_tools(c2_hover)

heading = Div(text="""<h1>Point Frequency Maps</h1>\
<p> The two maps below show locations and frequencies of theft and homicide crime cases in Buffalo.\
On the left, proportional point symbols show locations of theft cases and on the right are locations of homicide.</p>\
<p> Use the tools to the right of each map to pan, zoom, etc... \
Hover over a property to see the address and number of cases.</p> \
<p><b><i>Data Source</i></b> =<a href = https://data.cityofnewyork.us/Housing-Development/Housing-Litigations/59kj-x8nc target='_blank'>NYC Open Data.</a></p>.\
<p style="font-size:9px;">Maps created 4/10/2022 by Nguyet Que T. Tran.</p>""", sizing_mode="stretch_both")

layout = column(heading, row(f1,f2),sizing_mode='stretch_both',margin=(5,5,5,5))
show(layout)

The map showing the location of crime incidents that were occured. Each point is geocoded to the actual location of an address/house/store.

The size of the symbol at each point location represents the number of crime that were happened at the location. The higher cases, the larger cicle size.

Which Council Districts is the most dangerous place?

Point Distribution Map

In [ ]:
# Buffalo Council Districts dataset
api_url="https://data.buffalony.gov/resource/u5mx-ugvy.geojson"
cd_gdf=gpd.read_file(api_url)
cd_gdf.tail()
Out[ ]:
dist_id dist_name shape_leng objectid_1 geometry
5 9 NIAGARA 0.14931438999999999 9 MULTIPOLYGON (((-78.89588 42.92591, -78.89457 ...
6 7 UNIVERSITY 0.18683203000000001 8 MULTIPOLYGON (((-78.80780 42.95894, -78.80774 ...
7 8 MASTEN 0.20357618 10 MULTIPOLYGON (((-78.82813 42.94033, -78.82812 ...
8 4 SOUTH 0.51072598000000002 5 MULTIPOLYGON (((-78.88394 42.87750, -78.88360 ...
9 2 NORTH 0.20635665 3 MULTIPOLYGON (((-78.89236 42.96107, -78.89061 ...
In [ ]:
crime_gdf = crime_gdf.to_crs('epsg:3857')
cd_gdf = cd_gdf.to_crs('epsg:3857')
In [ ]:
joindf = gpd.sjoin(crime_gdf,cd_gdf,how='inner',op='intersects')
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py:2822: FutureWarning: The `op` parameter is deprecated and will be removed in a future release. Please use the `predicate` parameter instead.
  if self.run_code(code, result):
In [ ]:
joindf.tail()
Out[ ]:
council_district city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime geometry conrank newLoc index_right dist_id dist_name shape_leng objectid_1
264587 FILLMORE Buffalo Central District B 42.881 Assault NY 360290036001010 Monday 007202 Buffalo Police are investigating this report o... 36029007202 1010 -78.889 1012 1 1 72.02 4 2016-01-05T04:44:00 300 Block ERIE ST 360290001101 ASSAULT 16-0040085 72.02 2016-01-04T04:20:00 POINT (-8781883.309 5293876.338) blue -8781883.3091905585293876.338387279 2 0 UnAssigned 0.47838895999999997 1
264673 SOUTH Buffalo Seneca-Cazenovia District A 42.858 Breaking & Entering NY 360290035022002 Sunday 001100 Buffalo Police are investigating this report o... 36029001100 2002 -78.819 2002 2 2 11 10 2011-05-23T06:00:00 1 Block RIVERVIEW PL 360290001102 BURGLARY 11-1420386 11 2011-05-22T10:33:00 POINT (-8774090.945 5290382.915) lightgray -8774090.944835035290382.914579141 2 0 UnAssigned 0.47838895999999997 1
266008 SOUTH Buffalo Seneca-Cazenovia District A 42.858 Assault NY 360290170002003 Monday 001100 Buffalo Police are investigating this report o... 36029001100 2003 -78.819 2003 2 2 11 22 2019-09-24T22:35:00 1 Block RIVERVIEW PL 360290001102 ASSAULT 08-1891178 11 2008-07-07T22:15:00 POINT (-8774090.945 5290382.915) blue -8774090.944835035290382.914579141 2 0 UnAssigned 0.47838895999999997 1
274037 SOUTH Buffalo Seneca-Cazenovia District A 42.855 Assault NY 360290002004003 Saturday 001000 Buffalo Police are investigating this report o... 36029001000 4003 -78.814 4003 4 4 10 20 2009-06-07T01:00:00 2000 Block SENECA ST 360290002004 ASSAULT 09-1571016 10 2009-06-06T20:32:00 POINT (-8773534.347 5289927.347) blue -8773534.3473810635289927.346550928 2 0 UnAssigned 0.47838895999999997 1
276245 FILLMORE Buffalo Central District A 42.862 Theft NY 360290015001025 Sunday 000500 Buffalo Police are investigating this report o... 36029000500 1025 -78.867 1034 1 1 5 0 2018-09-02T23:57:00 1 Block W CHIPPEWA ST 360290001101 LARCENY/THEFT 18-2450708 5 2018-09-02T00:00:00 POINT (-8779434.280 5290990.373) lightgray -8779434.2803931075290990.373048019 2 0 UnAssigned 0.47838895999999997 1
In [ ]:
joindf['council_district']=joindf.council_district.astype(str)
ct = joindf.copy()
ct = ct.council_district.groupby(joindf['council_district']).count().sort_values(ascending=False)
ctdf=ct.to_frame(name='counts').reset_index()
In [ ]:
ctdf.tail()
Out[ ]:
council_district counts
4 FILLMORE 31534
5 MASTEN 31174
6 NIAGARA 30533
7 DELAWARE 18548
8 SOUTH 17281
In [ ]:
nCases = pd.merge(cd_gdf,ctdf,left_on="dist_name",right_on="council_district")
nCases['centroids'] =nCases['geometry'].centroid
nCases = nCases.set_geometry('centroids')
In [ ]:
maxcir = 60
maxcnt = nCases.counts.max()
nCases['radius']=(nCases.counts/maxcnt*maxcir)
nCases['radius']=nCases['radius'].astype(float).round().astype(int)
nCases.head()
Out[ ]:
dist_id dist_name shape_leng objectid_1 geometry council_district counts centroids radius
0 5 DELAWARE 0.19087878 6 MULTIPOLYGON (((-8778099.466 5305679.993, -877... DELAWARE 18548 POINT (-8778543.920 5302638.853) 24
1 3 FILLMORE 0.42294144 4 MULTIPOLYGON (((-8773501.858 5299351.099, -877... FILLMORE 31534 POINT (-8776864.657 5294510.894) 42
2 1 ELLICOTT 0.32953199 2 MULTIPOLYGON (((-8778908.981 5299381.803, -877... ELLICOTT 45443 POINT (-8779145.197 5296280.698) 60
3 6 LOVEJOY 0.35163747000000001 7 MULTIPOLYGON (((-8773498.518 5300355.298, -877... LOVEJOY 32546 POINT (-8773355.009 5294709.224) 43
4 9 NIAGARA 0.14931438999999999 9 MULTIPOLYGON (((-8782649.206 5300701.761, -878... NIAGARA 30533 POINT (-8781916.166 5298678.305) 40
In [ ]:
output_file("/content/CrimeDistributionMaps.html",
            title="Crime Incidents bu Council Districts in Buffalo")
f1 = figure(title = "Crime incident cases in Buffalo", tools=TOOLS, toolbar_sticky=False,**kwargs)

f1.add_tile(tileProvider)
f1.title.text_font_style = 'italic'
f1.title.text_font_size = '14pt'
f1.axis.visible=False 

TA20 = nCases.drop('geometry',axis=1).copy()


point_source_1 = GeoJSONDataSource(geojson=TA20.to_json())
poly_source = GeoJSONDataSource(geojson=cd_gdf.to_json())

Circle1=f1.circle('x','y',size='radius',fill_color='blue',line_color='blue',fill_alpha=0.5,source=point_source_1)
areas = f1.patches('xs','ys',source=poly_source,name="Council Districts",fill_color=None,fill_alpha=0.6,line_color="black",line_width=0.5)

c_hover= HoverTool(renderers=[Circle1])
c_hover.point_policy = "follow_mouse"
c_hover.tooltips=[
                  ("Council Districts","@dist_name"),
                  ("Number of Cases","@counts")]

f1.add_tools(c_hover)



heading = Div(text="""<h1>Point Distribution Map</h1>\
<p> The map below show locations and distribution of crime incident cases in Buffalo.\
<p> Use the tools to the right of map to pan, zoom, etc... \
Hover over a property to see the address and number of cases.</p> \
<p><b><i>Data Source</i></b> =<a href = https://data.cityofnewyork.us/Housing-Development/Housing-Litigations/59kj-x8nc target='_blank'>NYC Open Data.</a></p>.\
<p style="font-size:9px;">Maps created 4/10/2022 by Nguyet Que T. Tran.</p>""", sizing_mode="stretch_both")

layout = column(heading, row(f1),sizing_mode='stretch_both',margin=(5,5,5,5))
show(layout)

The map showing the number of confirmed crime cases by Buffalo Council Districts. The center of each council districts polygon boundary is used to represent the total number of confirmed crime cases within each council districts. The higher the number, the larger the circle size.

In summary, Ellicott is the council district that have the highest number of crime cases - 45,318 cases. While South council district have the lowest number of cases - 17,260 cases. However, almost haft of South council district area is parks/places without human physical addresses where none crime cases recorded in this dataset. So that we cannot conclude that South is safest council district in Buffalo.

Moreover, Delaware council districh have the second lowest number of cases - 18,526. And again, its area have a big park.

So that, in oder to tell the dangerous level of council districts, we will need to draw a point frequency map of all crime cases at duplicated locations.

Point Frequency Map

In [ ]:
Allcases = allHl.loc[allHl.parent_incident_type	!= None ].copy()
Allcases.to_crs('epsg:3857',inplace=True)
Allcases
# 'Assault', 'Theft of Vehicle', 'Theft', 'Robbery',
#        'Breaking & Entering', 'Sexual Offense', 'Homicide',
#        'Other Sexual Offense', 'Sexual Assault'
Out[ ]:
index council_district city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime geometry conrank counts
0 0 NORTH Buffalo Riverside District D 42.956 Assault NY 360290058023003 Monday 005802 Buffalo Police are investigating this report o... 36029005802 3003 -78.905 3003 3 3 58.02 3 2022-04-18T03:43:55 0 Block CONDON AV 360290058023 ASSAULT 22-1080081 58.02 2022-04-18T02:57:55 POINT (-8783664.421 5305276.995) blue 104
1 1 UNIVERSITY Buffalo University Heights District E 42.953 Theft of Vehicle NY 360290046013001 Monday 004601 Buffalo Police are investigating this report o... 36029004601 3001 -78.828 4001 3 4 46.01 10 2022-04-18T10:31:11 0 Block MERRIMAC ST 360290046013 UUV 22-1080284 46.01 2022-04-18T10:31:11 POINT (-8775092.820 5304820.702) lightgray 59
2 2 FILLMORE Buffalo Broadway Fillmore District C 42.897 Assault NY 360290027031012 Saturday 002703 Buffalo Police are investigating this report o... 36029002703 1012 -78.838 4004 1 4 27.03 21 2022-04-16T21:36:42 100 Block BECK ST 360290027031 ASSAULT 22-1060834 27.02 2022-04-16T21:36:42 POINT (-8776206.015 5296307.314) blue 104
3 3 MASTEN Buffalo Masten Park District C 42.91 Theft NY 360290033022004 Saturday 003302 Buffalo Police are investigating this report o... 36029003302 2004 -78.844 2004 2 2 33.02 19 2022-04-16T19:24:48 900 Block HUMBOLDT PW 360290033022 LARCENY/THEFT 22-1060730 33.02 2022-04-16T10:23:48 POINT (-8776873.932 5298282.947) lightgray 22
4 4 ELLICOTT Buffalo Pratt-Willert District B 42.895 Theft NY 360290025023000 Monday 002502 Buffalo Police are investigating this report o... 36029002502 3000 -78.857 2000 3 2 25.02 15 2022-04-18T15:18:44 200 Block CHERRY ST 360290025023 LARCENY/THEFT 22-1080545 25.02 2022-04-16T20:00:44 POINT (-8778321.085 5296003.408) lightgray 20
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
8540 274578 FILLMORE Buffalo Broadway Fillmore District C 42.884 Breaking & Entering NY 360290002001005 Sunday 001700 Buffalo Police are investigating this report o... 36029001700 1005 -78.837 1005 1 1 17 0 2018-10-09T08:05:00 400 Block HOWARD ST 360290001101 BURGLARY 18-2800271 17 2018-10-07T00:00:00 POINT (-8776094.696 5294332.098) lightgray 1
8541 275321 ELLICOTT Buffalo Pratt-Willert District B 42.881 Theft of Vehicle NY 360290035021003 Thursday 001403 Buffalo Police are investigating this report o... 36029001403 1003 -78.858 1003 1 1 14.03 0 2015-04-23T06:06:00 400 Block N DIVISION ST 360290001101 UUV 15-1120636 14.02 2015-04-16T00:00:00 POINT (-8778432.405 5293876.338) lightgray 1
8542 275512 ELLICOTT Buffalo Central District B 42.896 Theft NY 360290035021003 Sunday 007202 Buffalo Police are investigating this report o... 36029007202 1003 -78.902 1002 1 1 72.02 9 2019-09-24T22:42:00 1 Block LASALLE PKWY 360290001101 LARCENY/THEFT 07-1890708 72.02 2007-07-08T09:00:00 POINT (-8783330.463 5296155.360) lightgray 1
8543 275634 FILLMORE Buffalo Broadway Fillmore District C 42.889 Theft of Vehicle NY 360290001101002 Wednesday 001601 Buffalo Police are investigating this report o... 36029001601 1002 -78.831 2015 1 2 16.01 10 2019-09-24T22:47:00 1 Block CURTISS ST 360290001101 UUV 06-2770275 16 2006-10-04T10:00:00 POINT (-8775426.779 5295091.748) lightgray 1
8544 276104 SOUTH Buffalo Hopkins-Tifft District A 42.85 Theft of Vehicle NY 360290001103029 Wednesday 000110 Buffalo Police are investigating this report o... 36029000110 3029 -78.834 3034 3 3 1.10 0 2013-01-11T07:00:00 400 Block HOPKINS ST 360290001103 UUV 13-0100471 1.10 2012-12-19T00:00:00 POINT (-8775760.737 5289168.116) lightgray 1

8545 rows × 30 columns

In [ ]:
maxcir = 60
maxcnt = Allcases.counts.max()
Allcases['radius']=(Allcases.counts/maxcnt*maxcir)
Allcases['radius']=Allcases['radius'].astype(float).round().astype(int)
Allcases.head()
Out[ ]:
index council_district city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime geometry conrank counts radius
0 0 NORTH Buffalo Riverside District D 42.956 Assault NY 360290058023003 Monday 005802 Buffalo Police are investigating this report o... 36029005802 3003 -78.905 3003 3 3 58.02 3 2022-04-18T03:43:55 0 Block CONDON AV 360290058023 ASSAULT 22-1080081 58.02 2022-04-18T02:57:55 POINT (-8783664.421 5305276.995) blue 104 6
1 1 UNIVERSITY Buffalo University Heights District E 42.953 Theft of Vehicle NY 360290046013001 Monday 004601 Buffalo Police are investigating this report o... 36029004601 3001 -78.828 4001 3 4 46.01 10 2022-04-18T10:31:11 0 Block MERRIMAC ST 360290046013 UUV 22-1080284 46.01 2022-04-18T10:31:11 POINT (-8775092.820 5304820.702) lightgray 59 3
2 2 FILLMORE Buffalo Broadway Fillmore District C 42.897 Assault NY 360290027031012 Saturday 002703 Buffalo Police are investigating this report o... 36029002703 1012 -78.838 4004 1 4 27.03 21 2022-04-16T21:36:42 100 Block BECK ST 360290027031 ASSAULT 22-1060834 27.02 2022-04-16T21:36:42 POINT (-8776206.015 5296307.314) blue 104 6
3 3 MASTEN Buffalo Masten Park District C 42.91 Theft NY 360290033022004 Saturday 003302 Buffalo Police are investigating this report o... 36029003302 2004 -78.844 2004 2 2 33.02 19 2022-04-16T19:24:48 900 Block HUMBOLDT PW 360290033022 LARCENY/THEFT 22-1060730 33.02 2022-04-16T10:23:48 POINT (-8776873.932 5298282.947) lightgray 22 1
4 4 ELLICOTT Buffalo Pratt-Willert District B 42.895 Theft NY 360290025023000 Monday 002502 Buffalo Police are investigating this report o... 36029002502 3000 -78.857 2000 3 2 25.02 15 2022-04-18T15:18:44 200 Block CHERRY ST 360290025023 LARCENY/THEFT 22-1080545 25.02 2022-04-16T20:00:44 POINT (-8778321.085 5296003.408) lightgray 20 1
In [ ]:
from fiona.env import NullContextManager

Allcases.to_crs('epsg:3857',inplace=True)

f1 = figure(title = "Location of crime cases in Buffalo", tools=TOOLS, toolbar_sticky=False,**kwargs)

f1.add_tile(tileProvider)
f1.title.text_font_style = 'italic'
f1.title.text_font_size = '14pt'
f1.axis.visible=False 


point_source_1 = GeoJSONDataSource(geojson=Allcases.to_json())
poly_source = GeoJSONDataSource(geojson=cd_gdf.to_json())

Circle1=f1.circle('x','y',size='radius',fill_color='blue',line_color='blue',fill_alpha=0.5,source=point_source_1)
areas = f1.patches('xs','ys',source=poly_source,name="Council Districts",fill_color=None,fill_alpha=0.6,line_color="red",line_width=0.9)

c_hover= HoverTool(renderers=[Circle1])
c_hover.point_policy = "follow_mouse"
c_hover.tooltips=[("Address","@address_1," "@council_district"),
                  ("   " , "    "),
                  ("Number of Cases","@counts")]

f1.add_tools(c_hover)


heading = Div(text="""<h1>All Crimes Point Frequency Map</h1>\
<p> The map below show locations and frequencies of all crime cases in Buffalo by council district.\
<p> Use the tools to the right of each map to pan, zoom, etc... \
Hover over a property to see the address and number of cases.</p> \
<p><b><i>Data Source</i></b> =<a href = https://data.cityofnewyork.us/Housing-Development/Housing-Litigations/59kj-x8nc target='_blank'>NYC Open Data.</a></p>.\
<p style="font-size:9px;">Maps created 4/10/2022 by Nguyet Que T. Tran.</p>""", sizing_mode="stretch_both")

layout = column(heading, row(f1),sizing_mode='stretch_both',margin=(5,5,5,5))
show(layout)

Conclusion

Base on the map above, we can see the frequency and level of crimes that are happened.

According to the size and frequency of circles, we can conclude that Ellicott is the most dangerous council district in Buffalo.

And Delaware seems safer than South with smaller sizes of circles. More over, combine with Homicide map, there was no homicide case occurred in Delaware while in South there were location that has 39 homicide cases. So, Delaware is the safest council district in Buffalo.